This report describes a program developed for the conversion
from rectangular to relational data bases. The program's
intent is to facilitate loading some segment of a large
rectangular file into a relational data base. Thus large
rectangular files which could not easily be maintained as
relational data bases could have some subset of interest
loaded into a relational data base system, such as
REL English, to take advantage of the
more powerful relational query languages.


The inputs to the conversion routine are the subset of
the rectangular file, basic information about the
fields in the records, and prototype commands for
loading the relational data base. The output is a
file of commands for loading the records into the
relational data base.
Section 1 Statement of Problem

In their paper, Drs. O. Peter Buneman and Howard L.
Morgan(1) outline the desirability of being able to
restructure subsets of large rectangular data bases into
relational data bases in order to make use of the
flexibility of the interactive question-answering
capabilities of existing relational data base languages.
(This report uses REL English as an illustration of such a
language.) The basic problem is that creating relational
data bases is a very tedious task. For example, in REL
English, if it is desired to enter into the data base that
John Smith is taking course IS 201 and that his major is
Accounting, the following statements would be required:

JOHN SMITH:=NAME
IS 201:=NAME
ACCOUNTING:=NAME
STUDENT:=RELATION
MAJOR:=RELATION

JOHN  SMITH  IS  A STUDENT  OF  IS  201. 

ACCOUNTING  IS  THE  MAJOR  OF  JOHN  SMITH. 

It is easy to see that if the information about a few
hundred students was to be entered by hand, it could be very
time consuming. This information may already exist in a
rectangular file, or it could be entered into one with
considerably more ease than entering it into the relational
data base. This report documents a program which, given a
rectangular file and a minimal amount of information about
the desired statements (assertions), will generate the
statements necessary to establish the required relational
data base.

(1) "ASAP to REL: Efficient Relational Data Bases from Very
Large Files", by Dr. O. Peter Buneman and Dr. Howard L.
Morgan, Working Paper No. 75-01-06, Dept. of Decision
Sciences, University of Pennsylvania.
Section 2 Functional Design

A) Program CHAIN

Task: given a rectangular file, e.g.

FIRST-NAME  LAST-NAME  COURSE  MAJOR

JOHN  SMITH  IS 201  ACCOUNTING

output the assertions necessary to create a relational data
base, e.g. (REL English)

STUDENT:=RELATION  (1)

MAJOR:=RELATION  (2)

JOHN SMITH:=NAME  (3)

IS 201:=NAME  (4)

ACCOUNTING:=NAME  (5)

JOHN  SMITH  IS  A STUDENT  OF  IS  201.  (6) 

ACCOUNTING  IS  THE  MAJOR  OF  JOHN  SMITH.  (7) 

Assertions  1 and  2 would  not  need  to  be  repeated. 
Assertions  such  as  3,  4,  5,  6 and  7 would  have  to  be 
repeated  with  new  information  from  each  record. 

Therefore what would be necessary for the program would
be a set of text indicating the necessary assertions,
e.g.

STUDENT:=RELATION$

MAJOR:=RELATION$

&FIRST-NAME &LAST-NAME:=NAME$

&COURSE:=NAME$

&MAJOR:=NAME$

&FIRST-NAME &LAST-NAME IS A STUDENT OF &COURSE.$

&MAJOR IS THE MAJOR OF &FIRST-NAME &LAST-NAME.$

In this example of text, all references to record field
names are preceded by an & (the variable delimiter) and each
assertion is ended with a $ (the sentence delimiter). Both
the variable delimiter and the sentence delimiter used in
the text are specified in the input data.

The program will output all lines containing no
variable delimiters. Each line containing a variable
delimiter will be output once for each record, substituting
the field values of that record for the field names in the
text (the items preceded by a variable delimiter), deleting
all leading and trailing blanks in the field value.
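
The substitution rule described above can be sketched in a few lines. What follows is a minimal illustration in Python, not the ALGOL code of program CHAIN. The function names (split_sentences, substitute, generate), the representation of a record as a dictionary of field values, and the default set of characters that end a field name are all assumptions made for the example.

```python
def split_sentences(text, sentence_delim="$"):
    """Split prototype text into sentences; the delimiter itself is dropped."""
    return [part.strip() for part in text.split(sentence_delim) if part.strip()]


def substitute(sentence, values, var_delim="&", terminators=(" ", ":", ".")):
    """Replace each delimiter-prefixed field name by its stripped field value."""
    out = []
    i = 0
    while i < len(sentence):
        if sentence[i] == var_delim:
            # The field name runs up to the next terminator or variable delimiter.
            j = i + 1
            while (j < len(sentence) and sentence[j] not in terminators
                   and sentence[j] != var_delim):
                j += 1
            out.append(values[sentence[i + 1:j]].strip())
            i = j
        else:
            out.append(sentence[i])
            i += 1
    return "".join(out)


def generate(text, records, var_delim="&", sentence_delim="$"):
    """Emit nongenerating sentences once, generating sentences once per record."""
    sentences = split_sentences(text, sentence_delim)
    output = [s for s in sentences if var_delim not in s]
    generating = [s for s in sentences if var_delim in s]
    for record in records:
        output.extend(substitute(s, record, var_delim) for s in generating)
    return output
```

Sentences without a variable delimiter are emitted once, ahead of the per-record output, which matches the order in which the program is described as working.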

Besides the basic task of generating the assertions,
two other features were thought desirable.

1) The program can be set to discard any generated
assertions in which a blank field is placed or to substitute
a given set of characters (possibly the null set) in place
of the field. (Note: for REL it was necessary to delete any
assertions in which a totally blank field value was placed.)

2) The program can also be set to examine each character of
the field, after leading and trailing blanks have been
deleted, and sets of characters can be specified to be
substituted for single characters. (e.g. if a LISP based
system were being used, the blank character, " ", would have
to be replaced with "/ ", i.e. "Mary Jane" would have to be
replaced with "Mary/ Jane").
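
The two features can be illustrated together. This is a sketch only: the function name clean_field and its parameters are hypothetical, None is used here to mean "discard the assertion", and it is an assumption that the blank substitute is inserted as given, without character replacement.

```python
def clean_field(raw, replacements=None, blank_substitute=None):
    """Prepare one field value for insertion into an assertion.

    Returns None when the field is blank and no substitute is given,
    which the caller can treat as "discard this assertion".
    """
    value = raw.strip(" ")  # delete leading and trailing blanks
    if value == "":
        # Blank field: substitute the given characters (possibly ""),
        # or signal that the assertion should be discarded.
        return blank_substitute
    replacements = replacements or {}
    # Replace single characters by their substitute character sets.
    return "".join(replacements.get(ch, ch) for ch in value)
```

For example, clean_field("  Mary Jane ", {" ": "/ "}) gives "Mary/ Jane", while a totally blank field gives either the substitute or None.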
The program would also require the location of the
fields in the record, e.g.

FIELD NAME    BEGINNING COLUMN    ENDING COLUMN

FIRST-NAME    1                   15

LAST-NAME     16                  30

COURSE        31                  35

MAJOR         36                  50
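
Given such a table of column positions, pulling the field values out of a fixed length record is a matter of slicing. The sketch below assumes columns are numbered from 1 and that the ending column is inclusive, as the table suggests; the names FIELDS and extract_fields are hypothetical.

```python
# Field name -> (beginning column, ending column), both 1-based and inclusive.
FIELDS = {
    "FIRST-NAME": (1, 15),
    "LAST-NAME": (16, 30),
    "COURSE": (31, 35),
    "MAJOR": (36, 50),
}


def extract_fields(record, fields):
    """Slice each field out of a fixed length record and strip its blanks."""
    return {
        name: record[begin - 1:end].strip(" ")
        for name, (begin, end) in fields.items()
    }
```

A record such as "JOHN" padded to 15 columns, "SMITH" padded to 15 columns, "IS201" and "ACCOUNTING" would yield the four values with their padding removed.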

Finally, the program would require the records for
which the assertions are to be generated.

The basic operation of the program is as follows:

1) A set of code is generated which locates all
nongenerating sentences (i.e. those containing no
variable delimiters) in the text and which, for all
generating sentences (i.e. those containing
variable delimiters), contains information about the
location, in the record, of the field values
referenced by the assertions, and about the sentence
segments to be placed before or after each field
value. Sentence segments are the portions of the
generating statements which are to be output in the
assertions (e.g. "IS A STUDENT OF").

2) Output all nongenerating sentences.

3) Given a record, use the code developed in step 1 to
generate the required sentences and output them.

4) If duplicate assertions are to be deleted

a) Sort all assertions into alphabetic order.

b) Delete all redundant assertions.

c) Resort remaining assertions into the order in
which they were generated. (Note: the first
assertion generated of multiply generated assertions
is the one which is retained.)
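
Step 4 can be sketched as a sort, a single deleting pass, and a second sort. The Python below is an illustration of that idea rather than the program's own routine (which works on temporary files); the function name delete_duplicates is hypothetical, and the generation number is taken to be simply the position of the assertion in the generated list.

```python
def delete_duplicates(assertions):
    """Remove repeated assertions, keeping the first one generated."""
    # Attach a generation number to every assertion.
    numbered = list(enumerate(assertions))
    # a) Sort by the assertion text, subordered by generation number.
    numbered.sort(key=lambda pair: (pair[1], pair[0]))
    # b) One pass: keep only the first assertion of each group of equals.
    kept = []
    previous = None
    for number, text in numbered:
        if text != previous:
            kept.append((number, text))
        previous = text
    # c) Resort into generation order and drop the numbers.
    kept.sort(key=lambda pair: pair[0])
    return [text for _, text in kept]
```

Because equal assertions are subordered by generation number, the survivor of each group is always the earliest one, so the output keeps the order the relational language requires.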

B) Program usage

First, a rectangular file of the relevant subset of the
data base is set up in the desired form. That is, all
information stored in coded form is changed to the desired
form for use in the relational data base language. (e.g.
Accounting may be stored as Acct.) Besides the form
change, this step serves the purpose of isolating the
desired subset of the rectangular data base.

Second, program CHAIN is run using the output from the
first step, the prototype text, the field name directory,
and punctuation information (see Section 3) as input.

The user will be prompted to indicate whether or not
duplicate assertions are to be eliminated. The input text
for program CHAIN must be in the order required by the
relational data base language being used. (e.g. In REL,
data must be declared as a NAME before its relation to
previously entered data can be asserted.) Therefore if the
deletion of redundant assertions is required, it is
accomplished by sorting the assertions first by the
assertions themselves, subordered by order of generation
number. A pass is then made through this new file, deleting
all but the first assertion of each group of redundant
assertions. The remaining assertions are then resorted by
their order of generation numbers and the final file is
output without the order of generation numbers (as these are
not needed in the input to the relational data base
language).
Section 3 Operational Specifications

Four files are needed as input for the program.

1) The file of fixed length input records.

2) A directory file containing the length (number of ASCII
characters) of an input record and the field locations
(beginning and ending character numbers). The form of the
file is

L
"P1" B1 E1
"P2" B2 E2

where L is an integer equal to the number of characters
(ASCII) in one input record. The Pi are the field names
used in the text (these names must be quoted). The Bi are
integers equal to the beginning character number of fields
Pi. The Ei are integers equal to the ending character
number of fields Pi. Each item in the file must be
separated by blank(s) and/or carriage return(s) from the
other items.
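
A reader for a directory file of this form might look like the following. This is a hedged sketch, not the program's own reader: the function name parse_directory is hypothetical, field names are assumed to contain no quote marks, and a nonnumeric character number simply raises Python's ValueError.

```python
import re


def parse_directory(text):
    """Read the record length and the (begin, end) columns of each field."""
    # Items are quoted names or blank-separated numbers.
    tokens = re.findall(r'"[^"]*"|\S+', text)
    if not tokens:
        raise ValueError("directory file is empty")
    record_length = int(tokens[0])
    rest = tokens[1:]
    if len(rest) % 3 != 0:
        raise ValueError("each field needs a name and two character numbers")
    fields = {}
    for i in range(0, len(rest), 3):
        name = rest[i]
        if len(name) < 2 or not (name.startswith('"') and name.endswith('"')):
            raise ValueError("field names must be quoted")
        # int() raises ValueError for nonnumeric character numbers.
        fields[name[1:-1]] = (int(rest[i + 1]), int(rest[i + 2]))
    return record_length, fields
```

Since items may be separated by blanks or carriage returns, the sketch ignores line structure entirely and reads the items as one stream.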

3) The third file is the text of prototype assertions.
Where field values from the input records are to be
substituted, the field names given in the field directory
are used, preceded by a special character, the variable
delimiter. Each individual assertion is ended with another
special character, the sentence delimiter. These special
characters are not included in the output assertions.

4) The punctuation file contains information about the
special characters in the text and input records. The file
is of the form

"V"
"S"
"P1" "P2" "P3" ...
"B" "A"
"C1" "R1"
"C2" "R2"

where V is the variable delimiter (a single character). S
is the sentence delimiter (a single character). The Pi are
single characters which, other than the sentence delimiter
and variable delimiter, would denote the end of a variable
name in the text. If B in the input is yes, all output
assertions into which a blank field value is to be placed
are discarded; if it is no they are kept. If B is no, then
item A is to be included in the input file; if B is yes,
then item A is not to be included in the input file. A is a
set of characters which are to be put in the output
assertions in place of the blank field values. The Ci are
the single characters which, if they occur in a record field
value, are to be replaced by the character sets Ri before
putting the field value in the output assertions. Each item
in the file is to be quoted and must be separated from the
other items by blank(s) and/or carriage return(s). Also, if
a quote mark, ", is to appear in any of the items, it
must be input as two quote marks, "".
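
The layout of the punctuation file can be made concrete with a small reader. The sketch below is an illustration under stated assumptions, not the program's code: the function names and the returned dictionary keys are hypothetical, unquoted words between items (such as the labels in the sample files of Appendix A1) are assumed to be ignored, and the end of the punctuation list is assumed to be recognized by the "YES" or "NO" item. The error texts are taken from the diagnostic statements of this section.

```python
import re


def quoted_items(text):
    """Return the contents of every quoted item; "" inside an item is one quote."""
    raw = re.findall(r'"((?:[^"]|"")*)"', text)
    return [item.replace('""', '"') for item in raw]


def parse_punctuation(text):
    items = quoted_items(text)
    if len(items) < 3:
        raise ValueError("INSUFFICIENT INFORMATION IN PUNCTUATION FILE")
    variable_delim, sentence_delim = items[0], items[1]
    if len(variable_delim) != 1:
        raise ValueError("VARIABLE DELIMITER OF MORE THAN ONE CHARACTER")
    if len(sentence_delim) != 1:
        raise ValueError("SENTENCE DELIMITER OF MORE THAN ONE CHARACTER")
    # Name-ending punctuation runs up to the yes/no blank-deletion item.
    pos, terminators = 2, []
    while pos < len(items) and items[pos].upper() not in ("YES", "NO"):
        if len(items[pos]) != 1:
            raise ValueError("PUNCTUATION OF MORE THAN ONE CHARACTER "
                             "OR MISSPELLED BLANK DELETION")
        terminators.append(items[pos])
        pos += 1
    if pos == len(items):
        raise ValueError("INSUFFICIENT INFORMATION IN PUNCTUATION FILE")
    discard_blank = items[pos].upper() == "YES"
    pos += 1
    blank_substitute = None
    if not discard_blank:
        # The substitute for blank fields is present only when B is "no".
        if pos == len(items):
            raise ValueError("REPLACEMENT CHARACTER OF IMPROPER LENGTH "
                             "OR BLANK SUBSTITUTE MISSING")
        blank_substitute = items[pos]
        pos += 1
    rest = items[pos:]
    if len(rest) % 2 != 0:
        raise ValueError("INSUFFICIENT INFORMATION IN PUNCTUATION FILE")
    replacements = {}
    for char, substitute in zip(rest[0::2], rest[1::2]):
        if len(char) != 1:
            raise ValueError("REPLACEMENT CHARACTER OF IMPROPER LENGTH "
                             "OR BLANK SUBSTITUTE MISSING")
        replacements[char] = substitute
    return {"variable": variable_delim, "sentence": sentence_delim,
            "terminators": terminators, "discard_blank": discard_blank,
            "blank_substitute": blank_substitute, "replacements": replacements}
```

The quote-doubling rule is handled in one place, quoted_items, so an item written as four quote marks is read as a single quote character.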

Once the program has started it will prompt the user
for the device and file names of the input files. The
record file is the only input file which does not have to be
on a directory device (in which case it would not have a
file name), as the other input files are all passed twice.
If the user uses standard file names (i.e. REC.DAT for the
record file, FLD.DAT for the field directory file, TEXT.DAT
for the text file, and PUNC.DAT for the punctuation file)
and they are all on disk in his area, he need not provide
the device and file name for each file individually, but
need only answer "yes" when asked if standard names and
devices are used. If the user is providing the device and
file names in answer to prompts and either a device or file
name given is not available, the prompt will be given again
until the answer given is a device or file name which is
available. If it is desired to end such a cycle, the user
must ^C back to monitor level (abort the job). The user
will also be prompted for the device and file name (if a
directory device is specified) for the output file.
Finally, the user will be prompted to indicate whether or
not duplicate assertions should be eliminated. If the
system can handle duplicate assertions or the user knows
that no duplicate assertions will be generated, indicating
that elimination of duplicate assertions is not necessary
will save the sort-delete-resort steps.

In the course of execution the program will create and
delete four temporary files: RECS1.TMP, RECS2.TMP,
RECS3.TMP and RECS4.TMP. For sample input, output and
prompting and answer runs see Appendix A1. Note that the
assertions appear in the order given in the original text,
and the first occurrence of a multiply generated assertion
is the one that is retained when duplicate assertions are
eliminated.

Errors in the input information will most often cause
an abort ending and the printout of a diagnostic statement.
The diagnostic statements are

1) "FILE MUST BE ON A DIRECTORY DEVICE"

This is caused by specifying a device type for the
text, punctuation, or field directory file which is not a
directory device. This will not cause an abort ending, but
will cause another request for the device type.

2) "VARIABLE DELIMITER OF MORE THAN ONE CHARACTER"

The variable delimiter specified in the punctuation file is
of more than one character.

3) "SENTENCE DELIMITER OF MORE THAN ONE CHARACTER"

The sentence delimiter specified in the punctuation file is
of more than one character.

4) "PUNCTUATION OF MORE THAN ONE CHARACTER OR MISSPELLED
BLANK DELETION"

One of the items in the punctuation file, which was
taken as a character denoting the end of a field name in the
text, was more than one character.

5) "INSUFFICIENT INFORMATION IN PUNCTUATION FILE"

One of the required items was left out of the
punctuation file.

6) "REPLACEMENT CHARACTER OF IMPROPER LENGTH OR BLANK
SUBSTITUTE MISSING"

One of the items in the punctuation file, which was
taken as a character to be replaced if found in a record
field value, was more than one character.

7) "ATTEMPT TO READ OVER THE END OF FILE file name"

Could have been caused by leaving an item or quote mark
out of the punctuation or field directory file, specifying
an incorrect input record length, or leaving the final
sentence delimiter out of the text file.

8) "IMPROPER COLUMN NUMBERS IN THE FIELD NAME DIRECTORY"

Caused by a beginning or ending character number which
is greater than the record length or less than 0, or a
beginning character number which is greater than the ending
character number given for that field.

9) "FIELD NAME DIRECTORY IS EMPTY"

No field names are given in the field directory file.

10) "IMPROPER INFORMATION IN THE FIELD DIRECTORY FILE"

Nonnumeric data given as character numbers in the field
directory file.

11) "NO GENERATING SENTENCES IN THE TEXT FILE"

None of the prototype assertions in the text file
contain variable delimiters.

12) "FIELD NAME name NOT FOUND IN FIELD DIRECTORY"

The given field name was used in the text, but was not
found in the field directory.
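
As an illustration of how two of these checks (diagnostics 8 and 9) might be expressed, the sketch below tests a field directory that has already been read into memory. The function name directory_error and the dictionary representation are hypothetical; the conditions follow the wording of the diagnostics literally, including the "less than 0" bound.

```python
def directory_error(record_length, fields):
    """Return the diagnostic text for a bad field directory, or None if it is sound.

    fields maps each field name to (beginning, ending) character numbers.
    """
    if not fields:
        return "FIELD NAME DIRECTORY IS EMPTY"
    for begin, end in fields.values():
        out_of_range = (begin > record_length or end > record_length
                        or begin < 0 or end < 0)
        if out_of_range or begin > end:
            return "IMPROPER COLUMN NUMBERS IN THE FIELD NAME DIRECTORY"
    return None
```

Returning the message instead of aborting keeps the sketch easy to exercise; the program itself is described as ending with an abort in these cases.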
Section 4 Bugs and Further Work

There are three notable problems with the current
program. First, at certain points in the program, in order to
overcome a system bug, it was necessary to make a copy of
character strings and process the copy, when processing the
original would have been preferred.

The second problem was that of reading in the character
data. Other than reading in the characters one at a time
(which is time consuming) or having the input quoted (a
method used when not inconvenient), the only way to read in
character data is to set a string up as the output channel
and then perform a TRANSFILE. This function transfers all
data (in 7 bit ASCII) from the input channel to the output
channel until an end of file marker is encountered on either
channel. This function is much more efficient than reading
in the characters one at a time, but if the end of file
marker is encountered on the output channel first (i.e. the
string is filled), then when the next section of input is read
in, the last character which was attempted to be transferred
to the string has been lost. Since on the current system
records are separated by carriage returns, this was no
problem. However, if the records were not separated by
carriage returns, the character lost by using this method
would be part of the next record, rather than part of the
carriage return.

The third problem is that of the sorting routine used,
a simple merge sort with no in core sorting. This routine
was used in hopes that at a later time some way might be
found to hook up with the DEC 10 sorting routine. To date
no method for doing this has been found, so the program
still has its original and very inefficient (due to I/O)
sorting routine.

Further improvements that might be desirable are

1) A more elaborate user interface such as the one
illustrated in Drs. Morgan and Buneman's paper.

2) Have the program consult a dictionary for the information
which the user provides in the field directory.

3) Allow substitution for character sets in the record field
values rather than just single characters.

4) Allow the user to specify a record file and a set of
logic conditions to test the records with for inclusion in
the assertion generation, rather than providing a previously
screened input file.

5) Have the program do decoding of information in the
rectangular file.

A copy of the current code (the program is written in
ALGOL) is available from the Dept. of Decision Sciences at
the University of Pennsylvania.
Appendix A1

The following are two sample runs. Each sample shows
the input, then the prompting session and then the output.
In the prompting sessions the computer typed portion is
shown in capital letters and the user portion is shown in
small letters (note: in actual usage the user answers could
be in capital letters).


Input no. 1

Text (file: text.dat)

(RECTYPE PERSON NAME AGE RESIDENCE)%
(RECTYPE LOCATION CITY STATE)%
(GENCONS PERSON !NAME !AGE !CITY)%
(GENCONS LOCATION !CITY !STATE)%

Punctuation (file: punc.dat)

VARIABLE DELIMITER "!"
SENTENCE DELIMITER "%"
PUNCTUATION ")"
DELETE ON BLANK FIELD "NO" REPLACE WITH "UNDEF"
SUBSTITUTION CHARACTERS
" " "/ "

Records (file: rec.dat)

JOHN               19BOSTON       MA
MARY JANE          22BALTIMORE    MD
ROBERT             24BOSTON       MA
JILL               21NEW YORK CITYNY
JEFFERY              PHILADELPHIA PA

Record field directory (file: fld.dat)

39
"NAME" 1 19
"AGE" 20 21
"CITY" 22 34
"STATE" 35 39

Prompting session no. 1

ARE ALL INPUT FILES ON DISK AND ARE THEY NAMED:
RECORD FILE = REC.DAT, TEXT FILE = TEXT.DAT
FIELD DIRECTORY FILE = FLD.DAT, PUNCTUATION FILE
PUNC.DAT

ANSWER "YES" OR "NO", INCLUDE QUOTES
"yes"

ON WHAT DEVICE IS THE OUTPUT TO BE PLACED?
ANSWER: EG "DSK", INCLUDE QUOTES
"dsk"

WHAT IS THE NAME OF THE OUTPUT FILE TO BE?
ANSWER: EG "FIL.EXT", INCLUDE QUOTES
"fin1.dat"

ARE DUPLICATE ASSERTIONS TO BE DELETED?

ANSWER "YES" OR "NO", INLUDE QUOTES
"yes"
Output no. 1 (file: fin1.dat)

(RECTYPE PERSON NAME AGE RESIDENCE)
(RECTYPE LOCATION CITY STATE)
(GENCONS PERSON JOHN 19 BOSTON)
(GENCONS LOCATION BOSTON MA)
(GENCONS PERSON MARY/ JANE 22 BALTIMORE)
(GENCONS LOCATION BALTIMORE MD)
(GENCONS PERSON ROBERT 24 BOSTON)
(GENCONS PERSON JILL 21 NEW/ YORK/ CITY)
(GENCONS LOCATION NEW/ YORK/ CITY NY)
(GENCONS PERSON JEFFERY UNDEF PHILADELPHIA)
(GENCONS LOCATION PHILADELPHIA PA)

Input no. 2

Text (file: assert.dat)

STUDENT:=RELATION$
MAJOR:=RELATION$
&FIRST-NAME &LAST-NAME:=NAME$
&COURSE:=NAME$
&FIRST-NAME &LAST-NAME IS A STUDENT OF &COURSE.$
&MAJOR IS THE MAJOR OF &FIRST-NAME &LAST-NAME.$

Punctuation (file: puncl.dat)

VARIABLE DELIMITER "&"
SENTENCE DELIMITER "$"
PUNCTUATION ":" "."
DELETE ON BLANK FIELD "YES"

Records (file: record.dat)

JOHN           SMITH          IS201ACCOUNTING
TOM            JONES          IS201MATH
ARTHUR         BURNS          IS222FINANCE

Record field directory (file: field.dat)

50
"FIRST-NAME" 1 15
"LAST-NAME" 16 30
"COURSE" 31 35
"MAJOR" 36 50


Prompting session no. 2

ARE ALL INPUT FILES ON DISK AND ARE THEY NAMED:
RECORD FILE = REC.DAT, TEXT FILE = TEXT.DAT
FIELD DIRECTORY FILE = FLD.DAT, PUNCTUATION FILE
PUNC.DAT

ANSWER "YES" OR "NO", INCLUDE QUOTES
"no"

ON WHAT DEVICE IS THE PUNCTUATION FILE?
ANSWER: EG "DSK", INCLUDE QUOTES
"dsk"

WHAT IS THE NAME OF THE PUNCTUATION FILE?
ANSWER: EG "FIL.EXT", INCLUDE QUOTES
"puncl.dat"

ON WHAT DEVICE IS THE TEXT FILE?
ANSWER: EG "DSK", INCLUDE QUOTES
"dsk"

WHAT IS THE NAME OF THE TEXT FILE?
ANSWER: EG "FIL.EXT", INCLUDE QUOTES
"assert.dat"

ON WHAT DEVICE IS THE FIELD DIRECTORY FILE?
ANSWER: EG "DSK", INCLUDE QUOTES
"dsk"

WHAT IS THE NAME OF THE FIELD DIRECTORY FILE?
ANSWER: EG "FIL.EXT", INCLUDE QUOTES
"field.dat"

ON WHAT DEVICE IS THE RECORD FILE?
ANSWER: EG "DSK", INCLUDE QUOTES
"dsk"

WHAT IS THE NAME OF THE RECORD FILE?
ANSWER: EG "FIL.EXT", INCLUDE QUOTES
"record.dat"

ON WHAT DEVICE IS THE OUTPUT TO BE PLACED?
ANSWER: EG "DSK", INCLUDE QUOTES
"dsk"

WHAT IS THE NAME OF THE OUTPUT FILE TO BE?
ANSWER: EG "FIL.EXT", INCLUDE QUOTES
"fin2.dat"

ARE DUPLICATE ASSERTIONS TO BE DELETED?
ANSWER "YES" OR "NO", INLUDE QUOTES
"yes"
Output no. 2 (file: fin2.dat)

STUDENT:=RELATION
MAJOR:=RELATION
JOHN SMITH:=NAME
IS201:=NAME
JOHN SMITH IS A STUDENT OF IS201.
ACCOUNTING IS THE MAJOR OF JOHN SMITH.
TOM JONES:=NAME
TOM JONES IS A STUDENT OF IS201.
MATH IS THE MAJOR OF TOM JONES.
ARTHUR BURNS:=NAME
IS222:=NAME
ARTHUR BURNS IS A STUDENT OF IS222.
FINANCE IS THE MAJOR OF ARTHUR BURNS.